1 Introduction

Searching for fast algorithms to solve certain problems is a central and difficult task
in computer science. Positive results usually come from explicit constructions of efficient
algorithms for specific problem classes. Levin's algorithm is one of the few general-purpose
speed-ups. Within a (large) factor, it is the fastest algorithm to invert a function
$g: Y \to X$, if $g$ is easy to evaluate [Lev73, Lev84]. Given $x$, an inversion algorithm
$p$ tries to find a $y$ with $g(y) = x$ by evaluating $g$ on a trial sequence $y_i \in Y$.
Levin search runs all such algorithms $p$ in parallel with relative computation time
$2^{-l(p)}$; i.e. a time fraction $2^{-l(p)}$ is devoted to executing $p$, where $l(p)$ is
the length of program $p$ (coded in binary). The total computation time to find a solution
(if one exists) is bounded by $2^{l(p)} \cdot time_p$, where $p$ is any program of length
$l(p)$ finding a solution in time $time_p$. Hence, Levin search is optimal within a
multiplicative constant in computation time. It can be modified to handle time-limited
optimization problems as well [Sol86]. Many, but not all, problems are of inversion or
optimization type. The matrix multiplication example, considered in the next section, for
instance, cannot be brought into this form. Furthermore, the large factor $2^{l(p)}$
somewhat limits the applicability.
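The scheduling idea can be illustrated with a small Python sketch. It is a toy under loud assumptions, not Levin's construction itself: "programs" are Python generators that yield one candidate $y$ per step, each is tagged with a made-up code length $l(p)$, and the names `levin_search`, `programs`, and `max_phase` are hypothetical. In phase $i$ a program of length $l$ receives $2^{i-l}$ steps, so its share of the total time is proportional to $2^{-l}$.

```python
from itertools import count

_DONE = object()  # sentinel marking an exhausted trial sequence


def levin_search(g, x, programs, max_phase=30):
    """Toy Levin-style search for a y with g(y) == x.

    programs: list of (length, make_generator) pairs (illustrative assumption).
    In phase i, a program of length l gets 2**(i - l) steps, so its share of
    the total running time is proportional to 2**(-l).
    """
    running = [(length, make()) for length, make in programs]
    for phase in range(max_phase + 1):
        for length, gen in running:
            budget = 2 ** (phase - length) if phase >= length else 0
            for _ in range(budget):
                y = next(gen, _DONE)
                if y is _DONE:
                    break  # nothing left to try for this program
                if g(y) == x:
                    return y
    return None  # no solution found within the phase limit


def square(y):
    return y * y


toy_programs = [
    (1, lambda: count(0)),                  # short program: try 0, 1, 2, ...
    (3, lambda: iter(range(1000, 0, -1))),  # longer program: try 1000, 999, ...
]
solution = levin_search(square, 49, toy_programs)
```

The short program receives most of the time and finds the inverse here; a longer program would still be reached eventually, only slowed down by the factor $2^{l(p)}$ mentioned above.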

A wider class of problems can be phrased in the following way. Given a formal specification
of a problem depending on some parameter $x \in X$, we are interested in a fast algorithm
computing a solution $y \in Y$. This means that we are interested in a fast algorithm
computing $f: X \to Y$, where $f$ is a formal specification of the problem. For function
inversion problems, $f := g^{-1}$. Ideally, we would like to have the fastest algorithm,
maybe apart from some small constant factor in computation time. Unfortunately, Blum's
Speed-up Theorem [Blu67, Blu71] shows that there are problems for which an (incomputable)
sequence of speed-improving algorithms (of increasing size) exists, but no fastest
algorithm.

In the approach presented here, we consider only those algorithms which provably solve
a given problem, and have a fast (i.e. quickly computable) time bound. Neither the
programs themselves, nor the proofs need to be known in advance. Under these constraints we
construct the asymptotically fastest algorithm, save a factor of 5, that solves any formally
defined problem $f$.



Theorem 1. Let $p^*$ be a given algorithm computing $p^*(x)$ from $x$, or, more generally, a
specification of a function. Let $p$ be any algorithm, computing provably the same function
as $p^*$ with computation time provably bounded by the function $t_p(x)$ for all $x$.
$time_{t_p}(x)$ is the time needed to compute the time bound $t_p(x)$. Then the algorithm
$M_{p^*}$ constructed in Section 3 computes $p^*(x)$ in time

$time_{M_{p^*}}(x) \le 5 \cdot t_p(x) + d_p \cdot time_{t_p}(x) + c_p$

with constants $c_p$ and $d_p$ depending on $p$ but not on $x$. Neither $p$, $t_p$, nor the
proofs need to be known in advance for the construction of $M_{p^*}(x)$.

Known time bounds for practical problems can often be computed very quickly, i.e.
$time_{t_p}(x)/time_p(x)$ often converges very quickly to zero. Furthermore, from a
practical point of view, the provability restrictions are often rather weak. Hence, we have
constructed for every problem a solution, which is asymptotically only a factor 5 slower
than the (provably) fastest algorithm! There is no large multiplicative factor, as in Levin's
algorithm, and the problems are not restricted to inversion problems. What somewhat
spoils the practical applicability of $M_{p^*}$ is the large additive constant $c_p$, which
will be estimated in Section 4.

An interesting consequence of Theorem 1, discussed in Section 6, is that the fastest
program that computes a certain function is also one of the shortest programs that provably
computes this function. Looking for larger programs saves, at most, a finite number of
computation steps, but cannot improve the time order.

In Section 2 we elucidate the theorem and the range of applicability on several examples.
In Section 3 we give formal definitions of the expressions time, proof, compute, etc., which
occur in Theorem 1, and define the fast algorithm $M_{p^*}$. The central idea is to enumerate
all programs $p$ equivalent to $p^*$ by enumerating all proofs. In Section 4 we analyze the
algorithm $M_{p^*}$, especially its computation time, prove Theorem 1, and give upper bounds
for the constants $c_p$ and $d_p$. Subtleties regarding the underlying machine model are
briefly discussed in Section 5. In Section 6 we show that the fastest program computing a
certain function is also one of the shortest programs provably computing this function. For
this purpose, we extend the definition of the Kolmogorov complexity of a string and define
two new natural measures for the complexity of functions and programs. Section 7 outlines
generalizations of Theorem 1 to i/o streams and other time-measures. Conclusions are
given in Section 8.



2 Applicability 

To illustrate Theorem 1, we consider the problem of multiplying two $n \times n$ matrices. If
$p^*$ is the standard algorithm for multiplying two matrices¹
$x \in R^{n \times n} \times R^{n \times n}$ of size $l(x) \sim n^2$, then
$t_{p^*}(x) := 2n^3$ upper bounds the true computation time $time_{p^*}(x) = n^2(2n-1)$. We
know there exists an algorithm $p'$ for matrix multiplication with
$time_{p'}(x) \le t_{p'}(x) := c \cdot n^{2.81}$ [Str69]. The time-bound function (cast to an
integer) can, as in many cases, be computed very fast, $time_{t_{p'}}(x) = O(\log^2 n)$.
Hence, using Theorem 1, also $M_{p^*}$ is fast,
$time_{M_{p^*}}(x) \le 5c \cdot n^{2.81} + O(\log^2 n)$. Of course, $M_{p^*}$ would be of no
real use if $p'$ is already the fastest program, since $p'$ is known and could be used
directly. We do not know, however, whether there is an algorithm $p''$ with
$time_{p''}(x) \le d \cdot n^2 \log n$, for instance. But if it does exist,
$time_{M_{p^*}}(x) \le 5d \cdot n^2 \log n + O(1)$ for all $x$ is guaranteed. The matrix
multiplication example has been chosen for specific reasons. First, it is not an inversion
or optimization problem, hence unsuitable for Levin search. Second, although matrix
multiplication is a very important and time-consuming issue, $p'$ is not used in practice,
since $c$ is so large that for all practically occurring $n$, the cubic algorithm is faster.
The same is true for $c_p$ and $d_p$, but we must admit that although $c$ is large, the
bounds we obtain for $c_p$ and $d_p$ are tremendous. On the other hand, even Levin search,
which has a tremendous multiplicative factor, has been successfully applied [Sch97, SZW97],
when handled with care. The same should hold for Theorem 1, as will be discussed. We avoid
the $O()$ notation as far as possible, as it can be severely misleading
(e.g. $10^{42} = O(1)$).

¹ Instead of interpreting $R$ as the set of real numbers one might take the field
$F_2 = \{0, 1\}$ to avoid subtleties arising from large numbers. Arithmetic operations are
assumed to need one unit of time.
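The operation count quoted for the standard algorithm can be checked directly. The following Python sketch assumes the unit-cost model stated above (each scalar multiplication or addition costs one time unit); the function name `matmul_with_op_count` is purely illustrative.

```python
def matmul_with_op_count(a, b):
    """Multiply two n x n matrices (lists of lists) with the standard algorithm,
    counting every scalar multiplication and addition as one time unit."""
    n = len(a)
    ops = 0
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = a[i][0] * b[0][j]
            ops += 1  # first product of the inner sum
            for k in range(1, n):
                acc += a[i][k] * b[k][j]
                ops += 2  # one multiplication and one addition
            c[i][j] = acc
    return c, ops


for n in (1, 2, 5, 10):
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    m = [[i * n + j for j in range(n)] for i in range(n)]
    product, ops = matmul_with_op_count(m, identity)
    assert product == m
    # exact count n^2 (2n - 1), bounded by the time bound 2 n^3
    assert ops == n * n * (2 * n - 1) <= 2 * n ** 3
```

Each of the $n^2$ entries costs $n$ multiplications and $n-1$ additions, which gives the exact count $n^2(2n-1)$ and the simpler bound $2n^3$.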

An obvious time bound for $p$ is the actual computation time itself. An obvious algorithm
to compute $time_p(x)$ is to count the number of steps needed for computing $p(x)$. Hence,
inserting $t_p = time_p$ into Theorem 1 and using $time_{time_p}(x) \le time_p(x)$, we see
that the computation time of $M_{p^*}$ is optimal within a multiplicative constant
$(d_p + 5)$ and an additive constant $c_p$. The result is weaker than the one in Theorem 1,
but no assumption on the computability of time bounds had to be made.

When do we trust a fast algorithm to solve a problem? At least for well specified problems,
like satisfiability, solving a combinatoric puzzle, or computing the digits of $\pi$, we
usually invent algorithms, prove that they solve the problem, and in many cases can also
prove good and fast time bounds. In these cases, the provability assumptions in Theorem 1
are no real restriction. The same holds for approximate algorithms which guarantee a
precision $\varepsilon$ within a known time bound (many numerical algorithms are of this
kind). For exact/approximate programs provably computing/converging to the right answer
(e.g. the traveling salesman problem, and also many numerical programs), but for which no
good and easy to compute time bound exists, $M_{p^*}$ is only optimal apart from a huge
constant factor $5 + d_p$ in time, as discussed above. A precursor of algorithm $M_{p^*}$ for
this case, in a special setting, can be found in [Hut00]. For poorly specified problems,
Theorem 1 does not help at all.



3 The Fast Algorithm 

The idea of the algorithm $M_{p^*}$ is to enumerate proofs of increasing length in some
formal axiomatic system. If a proof actually proves that $p$ and $p^*$ are functionally
equivalent and $p$ has time bound $t_p$, add $(p, t_p)$ to a list $L$. The program $p$ in
$L$ with the currently smallest time bound $t_p(x)$ is executed. By construction, the result
$p(x)$ is identical to $p^*(x)$. The trick to achieve the time bound stated in Theorem 1 is
to schedule everything in a proper way, in order not to lose too much performance by
computing slow $p$'s and $t_p$'s before the $p$ has been found.
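At the coarsest level, enumerating proofs of increasing length means enumerating all binary strings in order of length and keeping those that encode valid proofs. The Python sketch below shows only this enumeration order. The checker `is_valid_proof` is a hypothetical placeholder, since a real one depends on the chosen formal system, and the function names are illustrative.

```python
from itertools import count, islice, product


def binary_strings():
    """Yield all binary strings by increasing length: '', '0', '1', '00', ..."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def enumerate_proofs(is_valid_proof):
    """Yield the strings accepted by a (hypothetical) proof checker, shortest first."""
    return (s for s in binary_strings() if is_valid_proof(s))


# There are 2**(l+1) - 1 < 2**(l+1) binary strings of length <= l.
l = 4
strings_up_to_l = list(islice(binary_strings(), 2 ** (l + 1) - 1))
assert all(len(s) <= l for s in strings_up_to_l)
assert len(set(strings_up_to_l)) == 2 ** (l + 1) - 1
```

The count checked at the end is the one used in the time analysis of Section 4 to bound how long it takes until a particular proof is reached.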

To avoid confusion, we formally define $p$ and $t_p$ to be binary strings. That is, $p$ is
neither a program nor a function, but can be informally interpreted as such. A formal
definition of the interpretations of $p$ is given below. We say "$p$ computes function $f$"
when a universal reference Turing machine $U$ on input $(p, x)$ computes $f(x)$ for all $x$.
This is denoted by $U(p, x) = f(x)$. To be able to talk about proofs, we need a formal logic
system $(\forall, \lambda, y_i, c_i, f_i, R_i, \to, \wedge, =, ...)$, and axioms, and
inference rules. A proof is a sequence of formulas, where each formula is either an axiom or
inferred from previous formulas in the sequence by applying the inference rules. We only
need to know that provability, Turing Machines, and computation time can be formalized:

1. The set of (correct) proofs is enumerable.

2. A term $u$ can be defined such that the formula $[\forall y : u(p, y) = u(p^*, y)]$ is
true if, and only if, $U(p, x) = U(p^*, x)$ for all $x$, i.e. if $p$ and $p^*$ describe the
same function.

3. A term $tm$ can be defined such that the formula $[tm(p, x) = n]$ is true if, and only
if, the computation time of $U$ on $(p, x)$ is $n$, i.e. if $n = time_p(x)$.

We say that $p$ is provably equivalent to $p^*$ if the formula
$[\forall y : u(p, y) = u(p^*, y)]$ can be proved.

$M_{p^*}$ starts three algorithms A, B, and C running in parallel.

Algorithm $M_{p^*}(x)$

   Initialize the shared variables $L := \{\}$, $t_{fast} := \infty$, $p_{fast} := p^*$.
   Start algorithms A, B, and C in parallel with 10%, 10% and 80%
   computational resources, respectively.
   That is, C performs 8 steps when A and B perform 1 step each.

Algorithm A

   for i := 1, 2, 3, ... do
      pick the i-th proof in the list of all proofs and
      isolate the last formula in the proof.
      if this formula is equal to $[\forall y : u(p, y) = u(p^*, y) \wedge u(t, y) \ge tm(p, y)]$
         for some strings p and t,
         then add (p, t) to L.
   next i

Algorithm B

   for all $(p, t) \in L$
      run U on all (t, x) in parallel for all t with relative computational
      resources $2^{-l(p)-l(t)}$.
      if U halts for some t and $U(t, x) < t_{fast}$,
         then $t_{fast} := U(t, x)$ and $p_{fast} := p$.
   continue (p, t)

Algorithm C

   for k := 1, 2, 4, 8, 16, 32, ... do
      pick the currently fastest program $p := p_{fast}$ with time bound $t_{fast}$.
      run U on (p, x) for k steps.
      if U halts in less than k steps,
         then print result U(p, x) and abort computation of A, B and C.
   continue k.
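The doubling schedule of algorithm C can be sketched in Python under simplifying assumptions: a "program" is modelled as a generator function that yields once per simulated step and returns its result, and `get_fastest` is a hypothetical stand-in for reading the shared variable $p_{fast}$ that A and B update. This is an illustration of the schedule only, not of the full $M_{p^*}$.

```python
def run_for(program, x, k):
    """Run a fresh copy of program(x) for at most k steps.

    Returns (halted, result, steps_used); the program counts as halted only
    if it finishes in fewer than k steps, as in algorithm C.
    """
    gen = program(x)
    steps = 0
    while steps < k:
        try:
            next(gen)
        except StopIteration as stop:
            return True, stop.value, steps
        steps += 1
    return False, None, k


def algorithm_c(get_fastest, x):
    """Periods k = 1, 2, 4, ...: restart the currently fastest known program
    and give it k steps; stop as soon as one run halts."""
    total_steps = 0
    k = 1
    while True:
        halted, result, used = run_for(get_fastest(), x, k)
        total_steps += used
        if halted:
            return result, total_steps
        k *= 2


def slow_square(x):
    """Toy program: x*x by repeated addition, one simulated step per addition."""
    acc = 0
    for _ in range(x):
        acc += x
        yield
    return acc


result, total_steps = algorithm_c(lambda: slow_square, 10)
```

Because the period length doubles, the steps wasted in aborted periods are bounded by a constant multiple of the running time of the program that finally halts; this is what keeps the overhead of restarting small.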

Note that A and B only terminate when aborted by C. The discussion of the algorithm(s) 
in the following sections clarifies details and proves Theorem 1. 
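Algorithm B can hand out the shares $2^{-l(p)-l(t)}$ only if they sum to at most one, which holds when the pairs $(p, t)$ are encoded with a prefix-free code. The Python sketch below checks both properties for a set of codewords. The example codewords are arbitrary illustrations, not encodings of actual programs, and the function names are hypothetical.

```python
from fractions import Fraction


def is_prefix_free(codewords):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(codewords)
    # In sorted order, a word that is a prefix of any other word is
    # also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_sum(codewords):
    """Exact value of the sum over codewords w of 2**(-len(w))."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)


good_code = ["0", "10", "110", "111"]
bad_code = ["0", "01", "1"]

assert is_prefix_free(good_code) and kraft_sum(good_code) <= 1
assert not is_prefix_free(bad_code) and kraft_sum(bad_code) > 1
```

With a prefix-free code the shares never exceed the 10% budget of B, no matter how many pairs A adds to $L$.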



4 Time Analysis

Henceforth we return to the convenient abbreviations $p(x) := U(p, x)$ and
$t_p(x) := U(t_p, x)$. Let $p'$ be some fixed algorithm that is provably equivalent to $p^*$,
with computation time $time_{p'}$ provably bounded by $t_{p'}$. Let $l(proof(p'))$ be the
length of the binary coding of the, for instance, shortest proof. Computation time always
refers to true overall computation time, whereas computation steps refer to instruction
steps: $steps = \alpha \cdot time$ if a percentage $\alpha$ of computation time is assigned
to an algorithm.

A) To write down (not to invent!) a proof requires $O(l(proof))$ steps. To check whether
the sequence of formulas constitutes a valid proof requires $O(l(proof)^2)$ steps. There are
fewer than $2^{l+1}$ proofs of length $\le l$. Algorithm A receives $\alpha = 10\%$ of
relative computation time. Hence, for a proof of $(p', t_{p'})$ to occur, and for
$(p', t_{p'})$ to be added to $L$, needs at most time
$T_A \le \frac{1}{10\%} \cdot 2^{l(proof(p'))+1} \cdot O(l(proof(p'))^2)$. Note that the same
program $p$ can and will be accompanied by different time bounds $t_p$; for instance,
$(p, time_p)$ will occur.

B) The time assignment of algorithm B to the $t_p$'s only works if the Kraft inequality
$\sum_{(p, t_p) \in L} 2^{-l(p)-l(t_p)} \le 1$ is satisfied [Kra49]. This can be ensured by
using prefix-free (e.g. Shannon-Fano) codes [Sha48, LV97]. The number of steps to calculate
$t_{p'}(x)$ is, by definition, $time_{t_{p'}}(x)$. The relative computation time $\alpha$
available for computing $t_{p'}(x)$ is $10\% \cdot 2^{-l(p')-l(t_{p'})}$. Hence, $t_{p'}(x)$
is computed and $t_{fast} \le t_{p'}(x)$ checked after time
$T_B \le T_A + 10 \cdot 2^{l(p')+l(t_{p'})} \cdot time_{t_{p'}}(x)$. We have to add $T_A$,
since B has to wait, in the worst case, time $T_A$ before it can start executing
$t_{p'}(x)$.

C) If algorithm C halts, its construction guarantees that the output is correct. In the 
following, we show that C always halts, and give a bound for the computation time. 

i) Assume that algorithm C stops before B performed the check $t_{p'}(x) < t_{fast}$,
because a different $p$ already computed $p(x)$. In this case $T_C \le T_B$.

ii) Assume that $k = k_0$ in C when B performs the check $t_{p'}(x) < t_{fast}$.
Running-time $T_B$ has passed until this point, hence $k_0 \le 80\% \cdot T_B$. Furthermore,
assume that C halts in period $k_0$ because the program (different from $p'$) executed in
this period computes the result. In this case, $T_C \le \frac{1}{80\%} 2k_0 \le 2T_B$.

iii) If C does not halt in period $k_0$ but $2k_0 \ge t_{fast}$, then $p'(x)$ has enough time
to compute the solution in the next period $k = 2k_0$, since
$time_{p'}(x) \le t_{fast} \le 4k_0 - 2k_0$. Hence $T_C \le \frac{1}{80\%} 4k_0 \le 4T_B$.

iv) Finally, if $2k_0 < t_{fast}$ we "wait" for the period $k > k_0$ with
$\frac{1}{2}k \le t_{fast} < k$. In this period $k$, either $p'(x)$, or an even faster
algorithm, which has in the meantime been constructed by A and B, will be computed. In any
case, the $2k - k > t_{fast}$ steps are sufficient to compute the answer. We have
$80\% \cdot T_C \le 2k \le 4t_{fast} \le 4t_{p'}(x)$.

The maximum of the cases (i) to (iv) bounds the computation time of C and, hence, of
$M_{p^*}$ by

$time_{M_{p^*}}(x) = T_C \le \max\{4T_B, 5t_p(x)\} \le 4T_B + 5t_p(x) \le 5 \cdot t_p(x) + d_p \cdot time_{t_p}(x) + c_p$

$d_p = 40 \cdot 2^{l(p)+l(t_p)}$,   $c_p = 40 \cdot 2^{l(proof(p))+1} \cdot O(l(proof(p))^2)$

where we have dropped the prime from $p$. We have also suppressed the dependency of
$c_p$ and $d_p$ on $p^*$ ($proof(p)$ depends on $p^*$ too), since we considered $p^*$ to be
a fixed given algorithm.
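As a worked step, the constants can be read off by inserting the bounds on $T_A$ and $T_B$ from parts A and B into $4T_B$ (with the prime on $p$ dropped, and using $4 \cdot \frac{1}{10\%} = 40$):

```latex
\begin{align*}
4T_B &\le 4T_A + 4 \cdot 10 \cdot 2^{l(p)+l(t_p)} \cdot time_{t_p}(x) \\
     &\le \underbrace{4 \cdot \tfrac{1}{10\%} \cdot 2^{l(proof(p))+1}
            \cdot O\bigl(l(proof(p))^2\bigr)}_{c_p}
        \;+\; \underbrace{40 \cdot 2^{l(p)+l(t_p)}}_{d_p} \cdot time_{t_p}(x)
\end{align*}
```

The factor 5 in front of $t_p(x)$ comes from case (iv) alone, where $80\% \cdot T_C \le 4t_p(x)$.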



5 Assumptions on the Machine Model 



In the time analysis above we have assumed that program simulation with abort possibility
and scheduling parallel algorithms can be performed in real-time, i.e. without losing
performance. Parallel computation can be avoided by sequentially performing all operations
for a limited time and then restarting all computations in a next cycle with double
the time and so on. This will increase the computation time of A and B (but not of C!)
by, at most, a factor of 4. Note that we use the same universal Turing machine $U$ with
the same underlying Turing machine model (number of heads, symbols, ...) for measuring
computation time for all programs (strings) $p$, including $M_{p^*}$. This prevents us from
applying the linear speedup theorem (which is cheating somewhat anyway), but allows
the possibility of designing a $U$ which allows real-time simulation with abort possibility.
Small additive "patching" constants can be absorbed in the $O()$ notation of $c_p$. Details
will be given elsewhere.



6 Algorithmic Complexity 



Data compression is a very important issue in computer science. Saving space or channel
capacity are obvious applications. A less obvious (but not far-fetched) application is that
of inductive inference in various forms (hypothesis testing, forecasting, classification, ...).
A free interpretation of Occam's razor is that the shortest theory consistent with past
data is the most likely to be correct. This has been put into a rigorous scheme by [Sol64]
and proved to be optimal in [Sol78, Hut99]. Kolmogorov Complexity is a universal notion
of the information content of a string [Kol65, Cha66, ZL70]. It is defined as the length of
the shortest program computing string $x$,

$K_U(x) := \min\{l(p) : U(p) = x\} = K(x) + O(1)$

where $U$ is some universal Turing Machine. It can be shown that $K_U(x)$ varies, at most,
by an additive constant independent of $x$ by varying the machine $U$. Hence, the Kolmogorov
Complexity $K(x)$ is universal in the sense that it is uniquely defined up to an additive
constant. $K(x)$ can be approximated from above (is co-enumerable), but is not finitely
computable. See [LV97] for an excellent introduction to Kolmogorov Complexity and
[VL00] for a review of Kolmogorov inspired prediction schemes.
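That $K(x)$ can be approached only from above has a familiar practical counterpart: any lossless compressor gives an upper bound on $K(x)$, up to the constant size of its decompressor. The Python sketch below uses the standard `zlib` module as one such compressor. It is an illustration of an upper bound only, and the function name `compressed_length_bits` is hypothetical.

```python
import random
import zlib


def compressed_length_bits(data: bytes) -> int:
    """Length in bits of a zlib encoding of data.

    Up to the constant-size decompressor this upper-bounds K(data); it says
    nothing about how far above the true value the bound lies.
    """
    return 8 * len(zlib.compress(data, 9))


regular = b"ab" * 500  # highly regular, 1000 bytes
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # pseudo-random, 1000 bytes

assert compressed_length_bits(regular) < compressed_length_bits(noisy)
```

The pseudo-random string actually has a short description (the generator plus its seed), which zlib cannot find. This gap between an achievable upper bound and the true value is the reason $K(x)$ is only approximable from above.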

Recently, Schmidhuber [Sch00] has generalized Kolmogorov complexity in various ways to
the limits of computability and beyond. In the following, we also need a generalization,
but of a different kind. We need a short description of a function, rather than a string.
The following definition of the complexity of a function $f$

$K'(f) := \min\{l(p) : U(p, x) = f(x) \ \forall x\}$

seems natural, but suffers from not even being approximable. There exists no algorithm
converging to $K'(f)$, because it is undecidable whether a program $p$ is the shortest
program equivalent to a function $f$. This is similar to the case of the fastest program.
This is obvious if $f$ is an abstract function. But even if we have a formal specification or
program $p^*$ of $f$, $K'(p^*)$ is not approximable. Using $K(p^*)$ is not a suitable
alternative, since $K(p^*)$ might be considerably longer than $K'(p^*)$, as in the former
case all information contained in $p^*$ will be kept, even that which is functionally
irrelevant (e.g. dead code). An alternative is to restrict ourselves to provably equivalent
programs. The length of the shortest one is

$K''(p^*) := \min\{l(p) : \text{a proof of } [\forall y : u(p, y) = u(p^*, y)] \text{ exists}\}$

It can be approximated from above, since the set of all programs provably equivalent to
$p^*$ is enumerable.

Having obtained, after some time, a very short description $p'$ of $p^*$ for some purpose
(e.g. for defining a prior probability for some inductive inference scheme), it is usually
also necessary to obtain values for some arguments. We are now concerned with the
computation time of $p'$. Could we get slower and slower algorithms by compressing $p^*$
more and more? Interestingly, this is not the case. Inventing complex (long) programs is not
necessary to construct asymptotically fast algorithms, under the stated provability
assumptions, in contrast to Blum's Theorem [Blu67, Blu71]. The following theorem roughly
says that there is a single program, which is the fastest and the shortest program.



Theorem 2. Let $p^*$ be a given algorithm or formal specification of a function. There
exists a program $\tilde p$, provably equivalent to $p^*$, for which the following holds

i) $l(\tilde p) \le K''(p^*) + O(1)$
ii) $time_{\tilde p}(x) \le 5 \cdot t_p(x) + d_p \cdot time_{t_p}(x) + c_p$

where $p$ is any program provably equivalent to $p^*$ with computation time provably less
than $t_p(x)$. The constants $c_p$ and $d_p$ depend on $p$ but not on $x$.



To prove the theorem, we just insert the shortest algorithm $p'$ provably equivalent to
$p^*$ into $M$, that is $\tilde p := M_{p'}$. As only $O(1)$ instructions are needed to build
$M_{p'}$ from $p'$, $M_{p'}$ has size $l(p') + O(1) = K''(p^*) + O(1)$. The computation time
of $M_{p'}$ is the same as that of $M_{p^*}$ apart from "slightly" different constants.



7 Generalizations

If $p^*$ has to be evaluated repeatedly, algorithm A can be modified to remember its current
state and continue operation for the next input (A is independent of $x$!). The large offset
time $c_p$ is only needed on the first call.

$M_{p^*}$ can be modified to handle i/o streams, definable by a Turing machine with monotone
input and output tapes (and bidirectional working tapes) receiving an input stream and
producing an output stream. The currently read prefix of the input stream is $x$.
$time_p(x)$ is the time used for reading $x$. $M_{p^*}$ caches the input and output streams,
so that algorithm C can repeatedly read/write the streams for each new $p$. The true
input/output tapes are used when needing/producing a new symbol. Algorithm B is reset after
1, 2, 4, 8, ... steps (not after reading the next symbol of $x$!) to appropriately take into
account increased prefixes $x$. Algorithm A just continues. The bound of Theorem 1 holds for
this case too, with slightly increased $d_p$.

The construction above also works if time is measured in terms of the current output rather
than the current input $x$. This measure is, for example, used for the time-complexity of
calculating the $n$-th digit of a computable real (e.g. $\pi$), where there is no input, but
only an output stream.



8 Summary & Conclusions 

We presented an algorithm $M_{p^*}$, which accelerates the computation of a program $p^*$.
The central idea was to enumerate all programs $p$ equivalent to $p^*$ by enumerating all
proofs. Under certain constraints, $M_{p^*}$ is the asymptotically fastest algorithm for
computing $p^*$ apart from a factor 5 in computation time. Blum's Theorem shows that the
provability constraints are essential. We have shown that the conditions of Theorem 1 are
often, but not always, satisfied for practical problems. For complex approximation
problems, for instance, where no good and fast time bound exists, $M_{p^*}$ is still optimal,
but in this case only apart from a large multiplicative factor. We briefly outlined how
$M_{p^*}$ can be modified to handle i/o streams and other time-measures. An interesting
consequence of Theorem 1 was that the fastest program computing a certain function is also
one of the shortest programs provably computing this function. Looking for larger programs
saves, at most, a finite number of computation steps, but cannot improve the time order.
To quantify this statement, we extended the definition of Kolmogorov complexity and
defined two new natural measures for the complexity of a function. The large constants
$c_p$ and $d_p$ seem to spoil a direct implementation of $M_{p^*}$. On the other hand, Levin
search has been successfully applied to solve rather difficult machine learning problems
[Sch97, SZW97], even though it suffers from a large multiplicative factor of similar origin.

The use of more elaborate theorem-provers, rather than brute force enumeration of all
proofs, could lead to smaller constants and bring $M_{p^*}$ closer to practical applications,
possibly restricted to subclasses of problems. A more fascinating (and more speculative)
way may be the utilization of so-called transparent or holographic proofs [BFLS91]. Under
certain circumstances they allow an exponential speed-up for checking proofs. This would
reduce the constants $c_p$ and $d_p$ to their logarithm, which is a small value. I would like
to conclude with a general question. Will the ultimate search for asymptotically fastest
programs typically lead to fast or slow programs for arguments of practical size? Levin
search, matrix multiplication and the algorithm $M_{p^*}$ seem to support the latter, but
this might be due to our inability to do better.



Acknowledgements: Thanks to Monaldo Mastrolilli and Jürgen Schmidhuber for enlightening
discussions and for proof-reading drafts.